🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 147931 posts in 44.2 ms
Target Policy Optimization
📐 ML Theory · arxiv.org · 1d
Markov Decision Processes: The Language of Reinforcement Learning
♠️ Game Theory · medium.com · 4d
Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow
AI Agents · semiengineering.com · 19h
Formalizing the "generative crash" via inverse reinforcement learning
AI Agents · news.ycombinator.com · 2d · Hacker News
Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
💬 LLMs · pub.towardsai.net · 5d
Three Ways Machines Learn
Machine Learning · medium.com · 3d
Continual learning for AI agents
AI Agents · bestblogs.dev · 4d
Predictive Representations for Skill Transfer in Reinforcement Learning
Machine Learning · arxiv.org · 22h
Continual learning for AI agents
AI Agents · blog.langchain.com · 4d · Hacker News
Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards
Machine Learning · arxiv.org · 2d
Thompson Sampling for Infinite-Horizon Discounted Decision Processes
📐 ML Theory · arxiv.org · 22h
Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization
📐 ML Theory · arxiv.org · 22h
Value Mirror Descent for Reinforcement Learning
📐 ML Theory · arxiv.org · 1d
DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning
🎲 Probability · arxiv.org · 22h
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
AI Agents · arxiv.org · 1d
Reinforcement Learning for LLM Post-Training: A Survey
💬 LLMs · arxiv.org · 22h
Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
AI Agents · arxiv.org · 1d
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
📐 ML Theory · arxiv.org · 22h
Adaptive Incentive Design with Regret Minimization
♠️ Game Theory · arxiv.org · 1d
A Control Barrier Function-Constrained Model Predictive Control Framework for Safe Reinforcement Learning
AI Agents · arxiv.org · 22h